THE DIRICHLET PROCESS WITH LARGE
CONCENTRATION PARAMETER 

LUAI AL LABADI AND MAHMOUD ZAREPOUR 

Abstract: Ferguson's Dirichlet process plays an important role in nonparametric Bayesian inference.
Let $P_a$ be the Dirichlet process in $\mathbb{R}$ with a base probability measure $H$ and a concentration parameter
$a > 0$. In this paper, we show that $\sqrt{a}\,(P_a((-\infty, t]) - H((-\infty, t]))$ converges to a certain Brownian
bridge as $a \to \infty$. We also derive a certain Glivenko-Cantelli theorem for the Dirichlet process. Using
the functional delta method, the weak convergence of the quantile process is also obtained. A large
concentration parameter occurs when a statistician puts too much emphasis on his/her prior guess. This
scenario also happens when the sample size is large and the posterior is used to make inference.

Key words and phrases: Bayesian nonparametric, Brownian bridge, Dirichlet process, quantile process, 
weak convergence. 



1 Introduction 

In nonparametric Bayesian inference, we need to place a prior on an infinite dimensional space
such as the space of probability measures. Ferguson (1973) used a Dirichlet process (a normalized
gamma process) as a prior on this space. For $k \ge 2$, we say that the random vector $(Y_1, \ldots, Y_k)$
has the Dirichlet distribution with parameters $(a_1, \ldots, a_k)$, where $a_i > 0$ for all $i$, if it has the joint
probability density function

\[ f(y_1, \ldots, y_k) = \frac{\Gamma\left(\sum_{i=1}^{k} a_i\right)}{\prod_{i=1}^{k} \Gamma(a_i)} \prod_{i=1}^{k} y_i^{a_i - 1} \, I_S(y_1, \ldots, y_k), \]

where $S = \{(y_1, \ldots, y_k) : y_i \ge 0, \ \sum_{i=1}^{k} y_i = 1\}$ and $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t} \, dt$, $x > 0$. We denote
by $D(a_1, \ldots, a_k)$ the Dirichlet distribution with parameters $a_1, \ldots, a_k$.

The Dirichlet process was defined in Ferguson (1973) as follows: let $(\mathfrak{X}, \mathcal{A})$ be an arbitrary
measurable space and $H$ be a probability measure on $(\mathfrak{X}, \mathcal{A})$. Let $a > 0$ be arbitrary. A random
probability measure $P_a = \{P_a(A)\}_{A \in \mathcal{A}}$ is called a Dirichlet process on $(\mathfrak{X}, \mathcal{A})$ with parameters $a$
and $H$ if, for any finite measurable partition $\{A_1, \ldots, A_k\}$ of $\mathfrak{X}$, the vector
$(P_a(A_1), \ldots, P_a(A_k))$ has the Dirichlet distribution with parameters $(aH(A_1), \ldots, aH(A_k))$.
The subscript $a$ is added since in the forthcoming sections we will study the asymptotic behavior
of the random probability measure $P_a$ for large values of $a$. We assume that if $H(A_j) = 0$,
then $P_a(A_j) = 0$ with probability one. We write $P_a \sim DP(a, H)$ to denote the Dirichlet process
with parameters $a$ and $H$. Throughout this paper, we use the same letter for the probability
measure and its corresponding cumulative distribution function, i.e. $P_a(t) = P_a((-\infty, t])$ and
$H(t) = H((-\infty, t])$. We also assume that the cumulative distribution function $H$ is continuous.

For any $A \in \mathcal{A}$, $P_a(A)$ has a Beta distribution with parameters $aH(A)$ and $a(1 - H(A))$. Thus,

\[ E(P_a(A)) = H(A) \quad \text{and} \quad Var(P_a(A)) = \frac{H(A)(1 - H(A))}{1 + a}. \tag{1.1} \]

Furthermore, for any two disjoint sets $A_i$ and $A_j \in \mathcal{A}$, it follows from the properties of a Dirichlet
distribution that (Wilks 1963, page 177)

\[ E(P_a(A_i) P_a(A_j)) = \frac{a}{a + 1} H(A_i) H(A_j). \tag{1.2} \]
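The moment formulas (1.1) and (1.2) can be checked numerically. The following is a minimal sketch, not part of the original argument: it draws the vector $(P_a(A_1), P_a(A_2), P_a(A_3))$ for a three-set partition by normalizing independent gamma variates, and compares Monte Carlo estimates with the closed forms. The partition masses, the value of $a$, the seed and all function names are illustrative assumptions.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet(alphas) vector by normalizing independent gamma variates."""
    gammas = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


def exact_moments(a, h_i, h_j):
    """Closed forms (1.1) and (1.2) for disjoint sets with base masses h_i and h_j."""
    mean_i = h_i
    var_i = h_i * (1.0 - h_i) / (1.0 + a)
    cross_ij = a / (a + 1.0) * h_i * h_j
    return mean_i, var_i, cross_ij


def monte_carlo_moments(a, base_masses, n_draws=20000, seed=0):
    """Estimate E[P(A_1)], Var[P(A_1)] and E[P(A_1) P(A_2)] by simulation."""
    rng = random.Random(seed)
    alphas = [a * h for h in base_masses]
    s1 = s11 = s12 = 0.0
    for _ in range(n_draws):
        p = sample_dirichlet(alphas, rng)
        s1 += p[0]
        s11 += p[0] * p[0]
        s12 += p[0] * p[1]
    mean_1 = s1 / n_draws
    return mean_1, s11 / n_draws - mean_1 ** 2, s12 / n_draws


A_CONC = 5.0                   # hypothetical concentration parameter
BASE_MASSES = [0.2, 0.3, 0.5]  # hypothetical H(A_1), H(A_2), H(A_3)
estimated = monte_carlo_moments(A_CONC, BASE_MASSES)
exact = exact_moments(A_CONC, BASE_MASSES[0], BASE_MASSES[1])
print("estimated:", estimated)
print("exact:    ", exact)
```

The two printed triples should agree up to Monte Carlo error.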

The probability measure $H$ is called the base measure of $P_a$. Clearly, from (1.1), $H$ plays the
role of the center of the process, while $a$ can be viewed as the concentration parameter. The larger $a$
is, the more likely it is that the realization of $P_a$ is close to $H$. Specifically, for any fixed set $A \in \mathcal{A}$
and $\epsilon > 0$, we have $P_a(A) \xrightarrow{p} H(A)$ as $a \to \infty$ since, by Chebyshev's inequality and (1.1),

\[ \Pr\{|P_a(A) - H(A)| > \epsilon\} \le \frac{H(A)(1 - H(A))}{\epsilon^2 (1 + a)}. \]

In this paper, "$\xrightarrow{p}$" denotes convergence in probability.
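To see how quickly the Chebyshev bound behind this convergence shrinks, the short sketch below evaluates $H(A)(1 - H(A)) / (\epsilon^2 (1 + a))$ for a few values of $a$. The choices $H(A) = 0.5$ and $\epsilon = 0.1$ are illustrative assumptions, and the bound is capped at 1 because it is a bound on a probability.

```python
def chebyshev_bound(a, h_mass, eps):
    """Upper bound on Pr(|P_a(A) - H(A)| > eps) implied by the variance in (1.1)."""
    bound = h_mass * (1.0 - h_mass) / (eps ** 2 * (1.0 + a))
    return min(1.0, bound)


# Hypothetical setting: H(A) = 0.5 (the worst case for the variance), eps = 0.1.
bounds = [chebyshev_bound(a, 0.5, 0.1) for a in (10, 100, 1000, 10000)]
for a, b in zip((10, 100, 1000, 10000), bounds):
    print(f"a = {a:>5}: bound = {b:.4f}")
```

The bound decays like $1/a$, so it is informative only once $a$ is large relative to $1/\epsilon^2$.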



An attractive property of the Dirichlet process is its conjugacy property. That is, if $X_1, \ldots, X_n$
is a random sample from $P_a \sim DP(a, H)$, then the posterior distribution of $P_a$ given $X_1, \ldots, X_n$
coincides with the distribution of the Dirichlet process with parameter measure $a^* H^*$, where

\[ a^* = a + n \quad \text{and} \quad H^* = \frac{a}{a + n} H + \frac{n}{a + n} \frac{\sum_{i=1}^{n} \delta_{X_i}}{n}. \tag{1.3} \]

Here and throughout the paper $\delta_X$ denotes the Dirac measure at $X$, i.e. $\delta_X(A) = 1$ if $X \in A$ and
$0$ otherwise.
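The posterior update (1.3) is simple enough to compute directly. The sketch below is an illustration under assumed inputs (a uniform base distribution on $[0, 1]$ and three made-up observations): it returns $a^*$ together with the cumulative distribution function of $H^*$ as a mixture of the base cdf and the empirical cdf. The function names are hypothetical.

```python
from bisect import bisect_right


def posterior_concentration(a, data):
    """a* = a + n from (1.3)."""
    return a + len(data)


def posterior_base_cdf(a, base_cdf, data):
    """Return t -> H*(t), the convex combination in (1.3) of H and the empirical cdf."""
    sorted_data = sorted(data)
    n = len(sorted_data)

    def cdf(t):
        if n == 0:
            return base_cdf(t)  # no data: the posterior base is the prior base
        empirical = bisect_right(sorted_data, t) / n
        return (a * base_cdf(t) + n * empirical) / (a + n)

    return cdf


def uniform_cdf(t):
    """Assumed base distribution H: uniform on [0, 1]."""
    return min(1.0, max(0.0, t))


observations = [0.1, 0.4, 0.9]  # hypothetical sample
a_star = posterior_concentration(2.0, observations)
h_star = posterior_base_cdf(2.0, uniform_cdf, observations)
print(a_star, h_star(0.5))
```

With $a = 2$ and $n = 3$ the prior base receives weight $2/5$ and the empirical distribution weight $3/5$, which illustrates the role of $a$ as a concentration parameter.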






Notice that the posterior base distribution $H^*$ is a convex combination of the base distribution $H$
and the empirical distribution. The weight associated with the prior base distribution $H$ is
proportional to $a$, giving another reason to call $a$ the concentration parameter. The weight associated with
the empirical distribution is proportional to the number of observations $n$. The posterior base
distribution $H^*$ approaches the prior base measure $H$ for large values of $a$. On the other hand, for small
values of $a$, $H^*$ is close to the empirical distribution.

The Dirichlet process has the following series representation:

\[ P_a = \sum_{i=1}^{\infty} J_i \delta_{\theta_i}, \tag{1.4} \]

where $(\theta_i)_{i \ge 1}$ is a sequence of independent and identically distributed (i.i.d.) random variables with
common distribution $H$ and $(J_i)_{i \ge 1}$ are random variables chosen to be independent of $(\theta_i)_{i \ge 1}$ and
such that $0 \le J_i \le 1$ and $\sum_{i=1}^{\infty} J_i = 1$ almost surely. For several representations for $(J_i)_{i \ge 1}$, see,
for example, Ferguson, Phadia, and Tiwari (1992). It follows from (1.4) that any realization of the
Dirichlet process must be a discrete probability measure.
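One concrete way to generate weights $(J_i)_{i \ge 1}$ of this kind is the stick-breaking construction of Sethuraman (1994), in which $J_i = V_i \prod_{j < i} (1 - V_j)$ with $V_i$ i.i.d. Beta$(1, a)$. This particular construction is an assumption made here for illustration and is not necessarily the representation used in the sequel. The sketch below truncates the series once the unassigned mass falls below a tolerance and takes $H$ to be uniform on $[0, 1]$.

```python
import random


def stick_breaking_atoms(a, rng, tol=1e-6, max_atoms=100000):
    """Return (weights, locations) of a truncated draw from DP(a, Uniform[0, 1])."""
    weights, locations = [], []
    remaining = 1.0  # mass not yet assigned to an atom
    while remaining > tol and len(weights) < max_atoms:
        v = rng.betavariate(1.0, a)     # stick proportion V_i ~ Beta(1, a)
        weights.append(remaining * v)   # J_i = V_i * prod_{j < i} (1 - V_j)
        locations.append(rng.random())  # theta_i drawn from the assumed base H
        remaining *= 1.0 - v
    return weights, locations


def cdf_at(weights, locations, t):
    """Evaluate the (truncated) random cdf P_a((-inf, t])."""
    return sum(w for w, x in zip(weights, locations) if x <= t)


weights, locations = stick_breaking_atoms(50.0, random.Random(3))
print(len(weights), sum(weights), cdf_at(weights, locations, 0.5))
```

Every draw is a finite collection of weighted atoms, which makes the discreteness of the realizations visible; for large $a$ many atoms carry comparable small weights and the random cdf stays close to $H$.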

Sethuraman and Tiwari (1982) studied the convergence and tightness of Dirichlet processes as
the parameters are allowed to converge in a certain sense. They showed that as the concentration
parameter $a \to 0$, the Dirichlet process converges to a degenerate probability measure at a particular
point in $\mathfrak{X}$ randomly chosen from $H$.

Let $\mathscr{S}$ be a collection of Borel sets in $\mathbb{R}$. For large values of the concentration parameter $a$, we
study the weak convergence of the centered and scaled Dirichlet process defined by

\[ D_a(S) = \sqrt{a} \, (P_a(S) - H(S)), \quad S \in \mathscr{S}. \tag{1.5} \]



We also derive the limiting distribution of the Dirichlet quantile process

\[ Q_a(t) = \sqrt{a} \, (P_a^{-1}(t) - H^{-1}(t)), \tag{1.6} \]

where in general the inverse of a distribution function $F$ is given by $F^{-1}(t) = \inf\{x : F(x) \ge t\}$.
Moreover, a certain Glivenko-Cantelli theorem for the Dirichlet process for large values of the
concentration parameter is obtained.

For the Dirichlet posterior process with parameters given in (1.3), the concentration parameter
$a^* \to \infty$ whenever $n \to \infty$ ($n$ is the sample size). Lo (1987) gave a complete study of the behavior of
the process

\[ d_{n,a}(t) = \sqrt{n} \, (P_a^*(t) - F_n(t)), \quad t \in \mathbb{R}, \]

as the sample size $n$ gets large, where $P_a^*$ is the posterior of the Dirichlet process $P_a$ given the data
and $F_n$ is the empirical distribution function. Using this result, Lo (1987) gave an asymptotic
justification of the use of the Bayesian bootstrap and provided large sample Bayesian bootstrap probability
intervals for the mean, the variance, and bands for the distributions.

2 Asymptotic Properties of the Dirichlet process 

In this section, we study the asymptotic properties of $P_a$ as $a \to \infty$, where $P_a \sim DP(a, H)$. Since
$H$ is strictly increasing, we have

\[ \theta_i \le t \quad \text{if and only if} \quad H(\theta_i) \le H(t). \]

Thus,

\[ P_a((-\infty, t]) = \sum_{i=1}^{\infty} J_i \delta_{\theta_i}((-\infty, t]) = \sum_{i=1}^{\infty} J_i \delta_{H(\theta_i)}((0, H(t)]). \]

Throughout this paper, "$\overset{d}{=}$" denotes equality in distribution. Since $(\theta_i)_{i \ge 1}$ is a sequence of i.i.d.
random variables with continuous distribution $H$, for $i \ge 1$, $H(\theta_i) \overset{d}{=} U_i$, where $(U_i)_{i \ge 1}$ is a sequence
of i.i.d. random variables with a uniform distribution on $[0, 1]$. Hence,

\[ P_a((-\infty, t]) \overset{d}{=} \sum_{i=1}^{\infty} J_i \delta_{U_i}((0, H(t)]). \]

Therefore, without loss of generality, we only consider the case when $H(t) = t$ (i.e., $(\theta_i)_{i \ge 1}$ is a
sequence of i.i.d. random variables with uniform distribution on $[0, 1]$). Hence, the process in (1.5)
reduces to

\[ D_a(S) = \sqrt{a} \, (P_a(S) - \lambda(S)), \tag{2.1} \]

where $\lambda$ is the Lebesgue measure on $[0, 1]$. Hereafter, unless otherwise stated, $P_a \sim DP(a, \lambda)$,
where $\lambda$ is the Lebesgue measure on $\mathfrak{X} = [0, 1]$.

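We now recall the definition of a Brownian bridge indexed by $\mathscr{S}$. A Gaussian process $\{B_\lambda(S) : S \in \mathscr{S}\}$
is called a Brownian bridge if

\[ E[B_\lambda(S)] = 0 \quad \text{and} \quad Cov(B_\lambda(S_i), B_\lambda(S_j)) = \lambda(S_i \cap S_j) - \lambda(S_i) \lambda(S_j), \tag{2.2} \]

where $S, S_i, S_j \in \mathscr{S}$ (Massart 1989).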

The next lemma gives the limiting distribution of the process (2.1) for any finite collection of Borel sets
$S_1, \ldots, S_k \in \mathscr{S}$. The proof of the lemma for $k = 2$ is given in the appendix and it can be
generalized easily to the case of arbitrary $k$. In this paper, "$\xrightarrow{d}$" denotes convergence in distribution.

Lemma 1. Let $D_a$ be as defined in (2.1). Then, as $a \to \infty$, for any fixed sets $S_1, \ldots, S_k \in \mathscr{S}$,

\[ (D_a(S_1), D_a(S_2), \ldots, D_a(S_k)) \xrightarrow{d} (B_\lambda(S_1), B_\lambda(S_2), \ldots, B_\lambda(S_k)), \]

where $B_\lambda$ is the Brownian bridge indexed by $\mathscr{S}$ with the mean and the covariance structure as given
in (2.2).

Remark 1. The convergence obtained in Lemma 1 is in fact convergence in total variation. This
type of convergence is stronger than convergence in distribution (Billingsley 1999, page 29).

Remark 2. It follows from Lemma 1 that, for any fixed Borel set $S \in \mathscr{S}$,

\[ D_a(S) = \sqrt{a} \, (P_a(S) - \lambda(S)) \xrightarrow{d} B_\lambda(S), \]

where $B_\lambda(S)$ is distributed as $N(0, \lambda(S)(1 - \lambda(S)))$.
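The one-dimensional limit in Remark 2 can be explored by simulation, since $P_a(S)$ has a Beta distribution with parameters $a\lambda(S)$ and $a(1 - \lambda(S))$. The following sketch is illustrative only (the values $a = 400$ and $\lambda(S) = 0.3$, the seed and the function names are assumptions): it compares the sample mean and variance of $\sqrt{a}(P_a(S) - \lambda(S))$ with the exact finite-$a$ variance $a\lambda(S)(1 - \lambda(S))/(a + 1)$ from (1.1) and with the limiting variance $\lambda(S)(1 - \lambda(S))$.

```python
import random
import statistics


def exact_scaled_variance(a, lam):
    """Variance of sqrt(a) * (P_a(S) - lam), obtained by scaling (1.1) by a."""
    return a * lam * (1.0 - lam) / (a + 1.0)


def simulate_scaled_deviations(a, lam, n_draws=20000, seed=1):
    """Draw sqrt(a) * (P_a(S) - lam) using P_a(S) ~ Beta(a * lam, a * (1 - lam))."""
    rng = random.Random(seed)
    root_a = a ** 0.5
    return [
        root_a * (rng.betavariate(a * lam, a * (1.0 - lam)) - lam)
        for _ in range(n_draws)
    ]


A_CONC, LAM = 400.0, 0.3  # hypothetical concentration and lambda(S)
draws = simulate_scaled_deviations(A_CONC, LAM)
sample_mean = statistics.fmean(draws)
sample_var = statistics.pvariance(draws)
print(sample_mean, sample_var, exact_scaled_variance(A_CONC, LAM), LAM * (1 - LAM))
```

The finite-$a$ variance differs from the limit only by the factor $a/(a + 1)$, so the two are already close for moderate $a$.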

Lemma 1 proves that the finite-dimensional distributions of the process $D_a$ converge to the
corresponding finite-dimensional distributions of $B_\lambda$. The next theorem shows that the process $D_a$
converges to the process $B_\lambda$ on $D[0, 1]$ with respect to the Skorokhod topology.

Theorem 1. Let $D_a$ be as defined in (2.1). Then, as $a \to \infty$, we have:

\[ \sqrt{a} \, (P_a(\cdot) - \lambda(\cdot)) \xrightarrow{d} B_\lambda(\cdot) \]

on $D[0, 1]$ with respect to the Skorokhod topology, where $B_\lambda$ is a Brownian bridge.

Proof. From Lemma 1 and Theorem 13.5 of Billingsley (1999) we need only to prove that for any
$0 \le t_1 \le t \le t_2 \le 1$,

\[ E\left[ |P_a(t) - P_a(t_1)|^{2\beta} \, |P_a(t_2) - P_a(t)|^{2\beta} \right] \le [F(t_2) - F(t_1)]^{2\alpha} \]

for some $\beta \ge 0$, $\alpha > 1/2$, and a nondecreasing continuous function $F$ on $[0, 1]$. Take $\beta = 1/2$,
$\alpha = 1$, and $F(t) = t$ to show that:

\[ E[(P_a(t) - P_a(t_1))(P_a(t_2) - P_a(t))] \le \frac{a}{a + 1} (t_2 - t_1)^2. \tag{2.3} \]

Observe that,

\[ (P_a(t) - P_a(t_1), P_a(t_2) - P_a(t)) \sim D(a\lambda(t_1, t], \ a\lambda(t, t_2], \ a(1 - \lambda(t_1, t] - \lambda(t, t_2])). \]

From (1.2) we have:

\[ E[(P_a(t) - P_a(t_1))(P_a(t_2) - P_a(t))] = \frac{a}{a + 1} \lambda(t_1, t] \lambda(t, t_2] = \frac{a}{a + 1} (t - t_1)(t_2 - t). \]

Since $t_1 \le t \le t_2$, both $t - t_1$ and $t_2 - t$ are at most $t_2 - t_1$, and (2.3) follows. This completes the proof of the theorem. □

As in Ferguson (1973), under the squared error loss and Dirichlet prior, the no data estimate (or
the posterior estimate) for the distribution is the prior distribution $H$. Under the absolute deviation
loss, the estimate will be the median of the Dirichlet process with the prior distribution of $H$.
Therefore, the Dirichlet quantile process plays a role in estimation. The following corollary derives
the asymptotic behavior of the Dirichlet quantile process defined by (1.6) when the concentration
parameter $a$ is large.

Corollary 1. Let $0 < p < q < 1$, and $H$ be a continuous function with positive derivative $h$ on
the interval $[H^{-1}(p) - \epsilon, H^{-1}(q) + \epsilon]$ for some $\epsilon > 0$. Let $Q_a$ be the Dirichlet quantile process
defined in (1.6), where $P_a \sim DP(a, H)$. Then, as $a \to \infty$, we have:

\[ Q_a(\cdot) \xrightarrow{d} -\frac{B_\lambda(\cdot)}{h(H^{-1}(\cdot))} \]

in $D[p, q]$. That is, the limiting process is a Gaussian process with zero mean and covariance function

\[ \frac{\min(s, t) - st}{h(H^{-1}(s)) \, h(H^{-1}(t))}, \quad s, t \in [p, q]. \]

Proof. By Theorem 1, the process $\sqrt{a} \, (P_a - H)$ converges in distribution to the process $B_H =
B_\lambda(H) = B_\lambda \circ H$. Almost all sample paths of the limiting process are continuous on the interval
$[H^{-1}(p) - \epsilon, H^{-1}(q) + \epsilon]$. By Lemma 3.9.23, page 386 of Van der Vaart and Wellner (1996), the
inverse map $H \mapsto H^{-1}$ is Hadamard-differentiable at $H$ tangentially to the subspace of functions
that are continuous on this interval. By the functional delta method (Theorem 3.9.4, page 374 of Van
der Vaart and Wellner (1996)) we have

\[ Q_a(\cdot) \xrightarrow{d} -\frac{B_\lambda \circ H \circ H^{-1}(\cdot)}{h(H^{-1}(\cdot))} = -\frac{B_\lambda(\cdot)}{h(H^{-1}(\cdot))} \]

in $D[p, q]$. This completes the proof of the corollary. □

Remark 3. Paralleling Remark 1 of Bickel and Freedman (1981), if $H^{-1}(0+) > -\infty$ and
$H^{-1}(1) < \infty$ and $h$ is continuous on $[H^{-1}(0+), H^{-1}(1)]$, the conclusion of the corollary holds in
$D[H^{-1}(0+), H^{-1}(1)]$. For example, if $H$ is a uniform distribution on $[0, 1]$, then the convergence
holds in $D[0, 1]$. More generally, we may have one end of the support finite and the other infinite
and a modified form of Corollary 1 still holds. Also from the result of Theorem 1, we can derive
asymptotic properties of any Hadamard-differentiable functional of the $DP(a, H)$ as $a \to \infty$.

Example 1 (Median). Let $M_a$ be the median of $P_a$ and $m$ be the median of $H$ (i.e. $P_a^{-1}(0.5) = M_a$
and $H^{-1}(0.5) = m$). From Corollary 1 we have:

\[ \sqrt{a} \, (M_a - m) \xrightarrow{d} N\left(0, \frac{1}{4 h^2(m)}\right), \]

where $h = H'$. Note that the asymptotic distribution of the median of the Dirichlet process coincides
with that of the sample median.
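As a quick numerical illustration of Example 1, the sketch below evaluates the limiting variance $p(1 - p)/h^2(H^{-1}(p))$ at $p = 0.5$ for two base distributions chosen here purely as examples: the standard normal, where the variance is $\pi/2$, and the unit exponential, where $m = \ln 2$, $h(m) = 1/2$ and the variance is $1$. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def quantile_limit_variance(p, density_at_quantile):
    """Limiting variance p(1 - p) / h(H^{-1}(p))^2 of the scaled p-th quantile."""
    return p * (1.0 - p) / density_at_quantile ** 2


# Assumed base H = N(0, 1): the median is 0 and h(0) = 1 / sqrt(2 * pi).
normal_median_var = quantile_limit_variance(0.5, NormalDist().pdf(0.0))

# Assumed base H = Exp(1): the median is ln 2 and h(ln 2) = exp(-ln 2) = 1/2.
exp_median_var = quantile_limit_variance(0.5, math.exp(-math.log(2.0)))

print(normal_median_var, math.pi / 2, exp_median_var)
```

A flatter density at the median gives a larger limiting variance, exactly as for the sample median.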

Example 2 (Interquartile Range). Similar to Example 1, let $IQR = Q_{3,a} - Q_{1,a}$, where $Q_{3,a}$ and
$Q_{1,a}$ are the third and the first quartiles of $P_a$ (i.e. $P_a^{-1}(0.75) = Q_{3,a}$ and $P_a^{-1}(0.25) = Q_{1,a}$). Let
$q_3$ and $q_1$ be the third and the first quartiles of $H$. From Corollary 1 a simple calculation shows that

\[ \sqrt{a} \, (IQR - (q_3 - q_1)) \xrightarrow{d} N\left(0, \ \frac{3}{16 h^2(q_3)} + \frac{3}{16 h^2(q_1)} - \frac{1}{8 h(q_1) h(q_3)}\right). \]

This coincides with the asymptotic distribution of the sample interquartile range.
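The "simple calculation" in Example 2 amounts to computing the variance of a difference of two coordinates of the limiting Gaussian process in Corollary 1, whose covariance at levels $s$ and $t$ is $(\min(s, t) - st) / (h(H^{-1}(s)) h(H^{-1}(t)))$. The sketch below carries this out under two assumed base distributions (uniform on $[0, 1]$ and standard normal); the function names are hypothetical.

```python
from statistics import NormalDist


def bridge_cov(s, t):
    """Covariance min(s, t) - s * t of the Brownian bridge on [0, 1]."""
    return min(s, t) - s * t


def iqr_limit_variance(h_q1, h_q3):
    """Limiting variance of the scaled interquartile range, given h(q_1) and h(q_3)."""
    var_q1 = bridge_cov(0.25, 0.25) / h_q1 ** 2
    var_q3 = bridge_cov(0.75, 0.75) / h_q3 ** 2
    cov_13 = bridge_cov(0.25, 0.75) / (h_q1 * h_q3)
    return var_q1 + var_q3 - 2.0 * cov_13


# Assumed base H = Uniform[0, 1]: h is identically 1.
uniform_iqr_var = iqr_limit_variance(1.0, 1.0)

# Assumed base H = N(0, 1): by symmetry h(q_1) = h(q_3).
std_normal = NormalDist()
h_q3 = std_normal.pdf(std_normal.inv_cdf(0.75))
normal_iqr_var = iqr_limit_variance(h_q3, h_q3)

print(uniform_iqr_var, normal_iqr_var)
```

For the uniform base the three terms are $3/16$, $3/16$ and $-2/16$, giving a limiting variance of $1/4$.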

In the next theorem we establish the Glivenko-Cantelli theorem for the Dirichlet process. In
this paper, "$\xrightarrow{a.s.}$" denotes almost sure convergence.

Theorem 2. Let $P_a \sim DP(a, H)$. Then,

\[ \sup_x |P_a(x) - H(x)| \xrightarrow{a.s.} 0, \]

as $a \to \infty$.

Proof. From Donoho and Liu (1988), 

( M M f .w- gW ir < £ Wx) _ (2 . 4) 

Notice that $P_a(x) \xrightarrow{p} H(x)$, as $a \to \infty$, and $(P_a(x) - H(x))^2$ is dominated by 1. Thus, by the
dominated convergence theorem (which remains valid for convergence in probability (Royden 1968,
page 92)), we obtain that the right hand side of (2.4) converges to zero. □
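The uniform convergence in Theorem 2 can be illustrated on a grid. The sketch below is a rough approximation under stated assumptions: $H$ is uniform on $[0, 1]$, the interval is split into $m = 50$ equal cells, and the cell masses of $P_a$ are drawn from the Dirichlet distribution with all parameters $a/m$ by normalizing gamma variates. Taking the maximum over grid points only gives a lower bound for the true supremum, so this is a qualitative check, not a verification of the theorem.

```python
import random


def sup_distance_on_grid(a, n_cells, rng):
    """max_j |P_a(j/m) - j/m| for one draw of DP(a, Uniform[0, 1]) seen on a grid."""
    gammas = [rng.gammavariate(a / n_cells, 1.0) for _ in range(n_cells)]
    total = sum(gammas)
    cumulative, worst = 0.0, 0.0
    for j, g in enumerate(gammas, start=1):
        cumulative += g / total  # P_a(j / m)
        worst = max(worst, abs(cumulative - j / n_cells))
    return worst


def mean_sup_distance(a, n_cells=50, n_draws=200, seed=2):
    """Average grid sup-distance over independent draws."""
    rng = random.Random(seed)
    return sum(sup_distance_on_grid(a, n_cells, rng) for _ in range(n_draws)) / n_draws


averages = {a: mean_sup_distance(a) for a in (10.0, 1000.0, 100000.0)}
for a, value in averages.items():
    print(f"a = {a:>8.0f}: mean sup distance on grid = {value:.4f}")
```

The averages should shrink roughly like $1/\sqrt{a}$, in line with the Brownian bridge scaling of Theorem 1.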

When the concentration parameter is large, the Dirichlet process and its corresponding quantile
process share many asymptotic properties with the empirical process and the quantile process.

Appendix: Proof of Lemma 1 for k = 2

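Assume that $S_1 \cap S_2 = \emptyset$. (The general case when $S_1$ and $S_2$ are not necessarily disjoint follows
from the continuous mapping theorem.) Note that

\[ (P_a(S_1), P_a(S_2), 1 - P_a(S_1) - P_a(S_2)) \sim D(a\lambda(S_1), \ a\lambda(S_2), \ a(1 - \lambda(S_1) - \lambda(S_2))). \]

Set $X_{i,a} = P_a(S_i)$ and $l_i = \lambda(S_i)$, $i = 1, 2$. Thus, the joint density function of $X_{1,a}$ and $X_{2,a}$ is:

\[ f_{X_{1,a}, X_{2,a}}(x_1, x_2) = \frac{\Gamma(a)}{\Gamma(a l_1) \Gamma(a l_2) \Gamma(a(1 - l_1 - l_2))} \, x_1^{a l_1 - 1} x_2^{a l_2 - 1} (1 - x_1 - x_2)^{a(1 - l_1 - l_2) - 1}. \]

The joint probability density function of $D_{1,a} = \sqrt{a}(X_{1,a} - l_1) = \sqrt{a}(P_a(S_1) - \lambda(S_1))$ and
$D_{2,a} = \sqrt{a}(X_{2,a} - l_2) = \sqrt{a}(P_a(S_2) - \lambda(S_2))$ is:

\[ f_{D_{1,a}, D_{2,a}}(y_1, y_2) = \frac{\Gamma(a)}{a \, \Gamma(a l_1) \Gamma(a l_2) \Gamma(a(1 - l_1 - l_2))} \left(\frac{y_1}{\sqrt{a}} + l_1\right)^{a l_1 - 1} \left(\frac{y_2}{\sqrt{a}} + l_2\right)^{a l_2 - 1} \left(1 - \frac{y_1}{\sqrt{a}} - \frac{y_2}{\sqrt{a}} - l_1 - l_2\right)^{a(1 - l_1 - l_2) - 1}. \]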

By Scheffé's theorem (Billingsley 1999, page 29), it is enough to show that:

\[ f_{D_{1,a}, D_{2,a}}(y_1, y_2) \to f(y_1, y_2) = \frac{1}{2\pi |\Sigma|^{1/2}} \exp\left\{ -(y_1, y_2) \Sigma^{-1} (y_1, y_2)^T / 2 \right\}, \tag{3.1} \]

where

\[ \Sigma = \begin{pmatrix} l_1(1 - l_1) & -l_1 l_2 \\ -l_1 l_2 & l_2(1 - l_2) \end{pmatrix}. \]

Use Stirling's formula (Wilks 1963, page 177)

\[ \Gamma(z) \approx \sqrt{2\pi} \, z^{z - 1/2} e^{-z}, \quad \text{as } z \to \infty, \]

where we use the notation $f(z) \approx g(z)$ as $z \to \infty$ if $\lim_{z \to \infty} f(z)/g(z) = 1$, to get:



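\begin{align*}
\lim_{a \to \infty} f_{D_{1,a}, D_{2,a}}(y_1, y_2)
&= \frac{1}{2\pi} \lim_{a \to \infty} \frac{\left(\frac{y_1}{\sqrt{a}} + l_1\right)^{a l_1 - 1} \left(\frac{y_2}{\sqrt{a}} + l_2\right)^{a l_2 - 1} \left(1 - \frac{y_1}{\sqrt{a}} - \frac{y_2}{\sqrt{a}} - l_1 - l_2\right)^{a(1 - l_1 - l_2) - 1}}{l_1^{a l_1 - 1/2} \, l_2^{a l_2 - 1/2} \, (1 - l_1 - l_2)^{a(1 - l_1 - l_2) - 1/2}} \\
&= \frac{1}{2\pi \sqrt{l_1 l_2 (1 - l_1 - l_2)}} \lim_{a \to \infty} \frac{\left(\frac{y_1}{\sqrt{a}} + l_1\right)^{a l_1 - 1} \left(\frac{y_2}{\sqrt{a}} + l_2\right)^{a l_2 - 1} \left(1 - \frac{y_1}{\sqrt{a}} - \frac{y_2}{\sqrt{a}} - l_1 - l_2\right)^{a(1 - l_1 - l_2) - 1}}{l_1^{a l_1 - 1} \, l_2^{a l_2 - 1} \, (1 - l_1 - l_2)^{a(1 - l_1 - l_2) - 1}} \\
&= \frac{1}{2\pi \sqrt{l_1 l_2 (1 - l_1 - l_2)}} \lim_{a \to \infty} \left(1 + \frac{y_1}{\sqrt{a} \, l_1}\right)^{a l_1} \left(1 + \frac{y_2}{\sqrt{a} \, l_2}\right)^{a l_2} \left(1 - \frac{y_1 + y_2}{\sqrt{a} \, (1 - l_1 - l_2)}\right)^{a(1 - l_1 - l_2)} \\
&= \frac{1}{2\pi \sqrt{\sigma_{11} \sigma_{22} (1 - \rho_{12}^2)}} \exp\left\{ \lim_{a \to \infty} a \ln v_a \right\}, \tag{3.2}
\end{align*}

where

\[ \sigma_{11} = l_1(1 - l_1), \quad \sigma_{22} = l_2(1 - l_2), \quad \rho_{12} = -\sqrt{\frac{l_1 l_2}{(1 - l_1)(1 - l_2)}} \tag{3.3} \]

and

\[ v_a = \left(1 + \frac{y_1}{\sqrt{a} \, l_1}\right)^{l_1} \left(1 + \frac{y_2}{\sqrt{a} \, l_2}\right)^{l_2} \left(1 - \frac{y_1 + y_2}{\sqrt{a} \, (1 - l_1 - l_2)}\right)^{1 - l_1 - l_2}. \]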
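Stirling's formula, which drives the cancellation of the gamma functions in this part of the proof, is easy to check numerically. The sketch below compares $\Gamma(z)$ with $\sqrt{2\pi} \, z^{z - 1/2} e^{-z}$ on the log scale to avoid overflow; the evaluation points are arbitrary and the function name is hypothetical.

```python
import math


def stirling_ratio(z):
    """Ratio Gamma(z) / (sqrt(2 * pi) * z**(z - 1/2) * exp(-z)), computed via logs."""
    log_stirling = 0.5 * math.log(2.0 * math.pi) + (z - 0.5) * math.log(z) - z
    return math.exp(math.lgamma(z) - log_stirling)


for z in (10.0, 1000.0, 1.0e6):
    print(f"z = {z:>9.0f}: ratio = {stirling_ratio(z):.8f}")
```

The ratio exceeds 1 by roughly $1/(12z)$, so the approximation error is negligible for arguments of the size $a l_i$ with $a$ large.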



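Observe that,

\[ \lim_{a \to \infty} a \ln v_a = \lim_{a \to \infty} \frac{1}{1/a} \left[ l_1 \ln\left(1 + \frac{y_1}{\sqrt{a} \, l_1}\right) + l_2 \ln\left(1 + \frac{y_2}{\sqrt{a} \, l_2}\right) + (1 - l_1 - l_2) \ln\left(1 - \frac{y_1 + y_2}{\sqrt{a} \, (1 - l_1 - l_2)}\right) \right]. \]

Using L'Hospital's rule we obtain that $\lim_{a \to \infty} a \ln v_a$ equals:

\begin{align*}
&\lim_{a \to \infty} \frac{1}{-1/a^2} \left[ l_1 \frac{-\frac{y_1}{2 l_1 a^{3/2}}}{1 + \frac{y_1}{l_1 \sqrt{a}}} + l_2 \frac{-\frac{y_2}{2 l_2 a^{3/2}}}{1 + \frac{y_2}{l_2 \sqrt{a}}} + (1 - l_1 - l_2) \frac{\frac{y_1 + y_2}{2 (1 - l_1 - l_2) a^{3/2}}}{1 - \frac{y_1 + y_2}{(1 - l_1 - l_2) \sqrt{a}}} \right] \\
&= \lim_{a \to \infty} \frac{a}{2} \left[ \frac{l_1 y_1}{l_1 \sqrt{a} + y_1} + \frac{l_2 y_2}{l_2 \sqrt{a} + y_2} - \frac{(1 - l_1 - l_2)(y_1 + y_2)}{(1 - l_1 - l_2) \sqrt{a} - (y_1 + y_2)} \right] \\
&= \lim_{a \to \infty} -\frac{a}{2} \left[ \frac{l_1 y_1 y_2 + (1 - l_2) y_1^2}{(l_1 \sqrt{a} + y_1) \left( (1 - l_1 - l_2) \sqrt{a} - (y_1 + y_2) \right)} + \frac{l_2 y_1 y_2 + (1 - l_1) y_2^2}{(l_2 \sqrt{a} + y_2) \left( (1 - l_1 - l_2) \sqrt{a} - (y_1 + y_2) \right)} \right] \\
&= -\frac{l_2 (1 - l_2) y_1^2 + 2 l_1 l_2 y_1 y_2 + l_1 (1 - l_1) y_2^2}{2 l_1 l_2 (1 - l_1 - l_2)} \\
&= -\frac{(1 - l_1)(1 - l_2)}{2 (1 - l_1 - l_2)} \left[ \frac{y_1^2}{l_1 (1 - l_1)} + \frac{y_2^2}{l_2 (1 - l_2)} + \frac{2 y_1 y_2}{(1 - l_1)(1 - l_2)} \right] \\
&= -\frac{1}{2 (1 - \rho_{12}^2)} \left[ \left(\frac{y_1}{\sqrt{\sigma_{11}}}\right)^2 + \left(\frac{y_2}{\sqrt{\sigma_{22}}}\right)^2 - 2 \rho_{12} \left(\frac{y_1}{\sqrt{\sigma_{11}}}\right) \left(\frac{y_2}{\sqrt{\sigma_{22}}}\right) \right],
\end{align*}

where $\sigma_{11}$, $\sigma_{22}$ and $\rho_{12}$ are defined in (3.3). This is the exponent of the bivariate normal density in (3.1), so the proof follows by using (3.2).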
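The density convergence established in this appendix can also be observed numerically. The sketch below is an illustration under assumed values ($l_1 = 0.3$, $l_2 = 0.5$ and the point $(y_1, y_2) = (-0.4, 0.1)$ are arbitrary choices): it evaluates the logarithm of the exact density of $(D_{1,a}, D_{2,a})$, obtained from the Dirichlet density by the change of variables $x_i = y_i/\sqrt{a} + l_i$, and compares it with the logarithm of the limiting bivariate normal density written with the quadratic form $y_1^2/l_1 + y_2^2/l_2 + (y_1 + y_2)^2/(1 - l_1 - l_2)$.

```python
import math


def log_density_scaled_dirichlet(a, l1, l2, y1, y2):
    """Log of the exact joint density of (D_{1,a}, D_{2,a}) at (y1, y2)."""
    l3 = 1.0 - l1 - l2
    root_a = math.sqrt(a)
    x1 = y1 / root_a + l1
    x2 = y2 / root_a + l2
    x3 = 1.0 - x1 - x2
    if min(x1, x2, x3) <= 0.0:
        return -math.inf  # the point lies outside the support for this a
    log_norm = (math.lgamma(a) - math.log(a)  # log(a) is the Jacobian of the scaling
                - math.lgamma(a * l1) - math.lgamma(a * l2) - math.lgamma(a * l3))
    return (log_norm + (a * l1 - 1.0) * math.log(x1)
            + (a * l2 - 1.0) * math.log(x2) + (a * l3 - 1.0) * math.log(x3))


def log_density_limit(l1, l2, y1, y2):
    """Log of the bivariate normal density with covariance matrix Sigma."""
    l3 = 1.0 - l1 - l2
    quad = y1 ** 2 / l1 + y2 ** 2 / l2 + (y1 + y2) ** 2 / l3  # y' Sigma^{-1} y
    return -math.log(2.0 * math.pi) - 0.5 * math.log(l1 * l2 * l3) - 0.5 * quad


L1, L2, Y1, Y2 = 0.3, 0.5, -0.4, 0.1  # hypothetical evaluation point
target = log_density_limit(L1, L2, Y1, Y2)
gaps = {a: abs(log_density_scaled_dirichlet(a, L1, L2, Y1, Y2) - target)
        for a in (1.0e2, 1.0e4, 1.0e6)}
for a, gap in gaps.items():
    print(f"a = {a:>9.0f}: |log f_a - log f| = {gap:.6f}")
```

The determinant of $\Sigma$ equals $l_1 l_2 (1 - l_1 - l_2)$, which is why that product appears in the normalizing constant; the gap between the two log densities should shrink at roughly the rate $1/\sqrt{a}$.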



